In [1]:
%reload_ext watermark
%watermark -p pandas,networkx,numpy,matplotlib -v -n


Tue Sep 13 2016 

CPython 3.5.2
IPython 5.1.0

pandas 0.18.1
networkx 1.11
numpy 1.11.1
matplotlib 1.5.2

Creating Graphs in NetworkX

  • Creating a graph object
  • Adding nodes and edges
  • Adding attributes
  • Loading in several data types

In [2]:
import csv

import networkx as nx 
import pandas as pd 
import matplotlib.pyplot as plt 
%matplotlib inline

In [3]:
# Create empty graph
G = nx.Graph()

In [4]:
# Add nodes
G.add_node(1)
G.add_nodes_from([2, 3])
G.add_node(4)

In [5]:
G.nodes()


Out[5]:
[1, 2, 3, 4]

⚠️
Note: In networkx 2.0, several methods now return iterators

For more details see: https://networkx.github.io/documentation/development/reference/migration_guide_from_1.x_to_2.0.html


In [6]:
# add edges

G.add_edge(1, 2)

In [7]:
# get graph info
print(nx.info(G))


Name: 
Type: Graph
Number of nodes: 4
Number of edges: 1
Average degree:   0.5000

In [8]:
nx.draw(G, with_labels=True)


Adding and Inspecting Attributes


In [9]:
# add at creation
# nodes
G.add_node(5, favorite_color='blue')

G.add_nodes_from([(6, {'favorite_color' : 'red'}),
                  (7, {'favorite_color' :'purple'})])

# edges
G.add_edge(5, 6, {'relationship' : 'best friends'})

In [10]:
# accessing node attributes
print("Node 5 attributes:", G.node[5])

# accessing edge attributes
print("Edge 5-6 attributes:", G.edge[5][6])


Node 5 attributes: {'favorite_color': 'blue'}
Edge 5-6 attributes: {'relationship': 'best friends'}

Adding Attributes for each existing node


In [11]:
favorite_foods = {
    1 : 'pizza',
    2 : 'mac and cheese',
    3 : 'balogna sandwich',
    4 : 'pizza',
    5 : 'chocolate',
    6 : 'pizza',
    7 : 'bananas'
}

In [12]:
nx.set_node_attributes(G, 'favorite_food', favorite_foods)

In [13]:
print("Node 4's favorite food is %s" % G.node[4]['favorite_food'])


Node 4's favorite food is pizza

Reading in Different Representations of Graphs

Data for graphs and networks comes in many different representations.

Representations:

  • Edge List
  • Adjacency Matrix
  • Adjacency List (not covered)
  • Incidence Matrix (not covered)

Note: Representations are related to, but distinct from, the storage format. In our examples, we'll be loading our data from text files. You may also have network data stored as JSON, GEXF, or other formats. For more details, check the docs.

Grey's Anatomy Dataset

The dataset we'll look at is a record of all "romantic" encounters between characters on the TV show Grey's Anatomy.

Edge Lists

An edge list is a common way of representing a graph. This representation can be thought of as a list of tuples, where each tuple represents an edge between two of the nodes in your graph. The nodes of the graph can be inferred by taking the set of objects from all tuples.

You can infer/determine whether a graph is directed or weighted from an edge list.

  • Weighted: If edges appear more than once, or if an additional weight attribute is added as a 3rd column, the graph is weighted
  • Directed: If the "From" and "To" (often seen as "Source" and "Target") of an edge in the list is not arbitrary, it's a directed graph

In [14]:
# what does it look like? 
!head ../data/ga_edgelist.csv


"from","to"
"lexi","sloan"
"lexi","karev"
"owen","yang"
"owen","altman"
"sloan","torres"
"sloan","altman"
"torres","arizona"
"torres","karev"
"derek","grey"

In [15]:
edges = []
with open('../data/ga_edgelist.csv', 'r') as f:
    filereader = csv.reader(f, delimiter=",", quotechar='"')
    next(filereader) # skips header row
    for row in filereader:
            edges.append(row)

In [16]:
edges[0:5]


Out[16]:
[['lexi', 'sloan'],
 ['lexi', 'karev'],
 ['owen', 'yang'],
 ['owen', 'altman'],
 ['sloan', 'torres']]

In [17]:
GA = nx.from_edgelist(edges)

In [18]:
print(nx.info(GA))


Name: 
Type: Graph
Number of nodes: 32
Number of edges: 34
Average degree:   2.1250

Mediating Data Processing through pandas

Often times the data we'll want to use will probably be analyzed beforehand with pandas. Reading in our data to a DataFrame first saves us a bit of time writng code to open the files due to read_csv having sensible defaults around quoted characters and header rows.


In [19]:
ga_edges = pd.read_csv('../data/ga_edgelist.csv')

ga_edges.head()


Out[19]:
from to
0 lexi sloan
1 lexi karev
2 owen yang
3 owen altman
4 sloan torres

In [20]:
GA = nx.from_pandas_dataframe(ga_edges, source="from", target="to")

In [21]:
# validate info

print(nx.info(GA))


Name: 
Type: Graph
Number of nodes: 32
Number of edges: 34
Average degree:   2.1250

In [22]:
nx.draw(GA, with_labels=True)


Adjacency Matrices

A common way of representing graph data is through an adjacency matrix -- often referred to mathematically as A. This data structure is a square, n x n matrix where n = number of nodes. Each column and row in the matrix is a node. For any two nodes, i and j the value at Aij (row i and column j) represents the weight of the edge between nodes i and j.


In [23]:
ga_adj = pd.read_csv('../data/ga_adj.csv', index_col=0)

ga_adj.ix[0:5, 0:5]


Out[23]:
denny kepner grey colin finn
denny 0.0 0.0 0.0 0.0 0.0
kepner 0.0 0.0 0.0 0.0 0.0
grey 0.0 0.0 0.0 0.0 1.0
colin 0.0 0.0 0.0 0.0 0.0
finn 0.0 0.0 1.0 0.0 0.0

In [24]:
GAAdj = nx.from_numpy_matrix(ga_adj.values)

In [25]:
# Numpy matrices don't have labels :(
print(GAAdj.nodes())


[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]

In [26]:
label_mapping = dict(zip(GAAdj.nodes(), ga_adj.columns))

GAAdj = nx.relabel_nodes(GAAdj, label_mapping)

In [27]:
nx.draw_spring(GAAdj, with_labels=True)


Are the two graphs the same?


In [28]:
# Easiest, least robust way:
print("Edge List Graph\n", nx.info(GA))
print("\nAdj. Matrix Graph\n", nx.info(GAAdj))


Edge List Graph
 Name: 
Type: Graph
Number of nodes: 32
Number of edges: 34
Average degree:   2.1250

Adj. Matrix Graph
 Name: ()
Type: Graph
Number of nodes: 32
Number of edges: 34
Average degree:   2.1250

In [29]:
# Fancy math way that checks additional conditions
print("Isomorphic?", nx.is_isomorphic(GA, GAAdj))


Isomorphic? True

Gotchas


In [30]:
print("'denny' From Edge List Graph:", GA['denny'])
print("'denny' From Adjacency Matrix Graph:", GAAdj['denny'])


'denny' From Edge List Graph: {'izzie': {}}
'denny' From Adjacency Matrix Graph: {'izzie': {'weight': 1.0}}

⚠️ Observation: Edge weights are inferred from adjacency matrix


In [31]:
original_edgelist = sorted(nx.to_edgelist(GA))
adjacency_edgelist = sorted(nx.to_edgelist(GAAdj))
for i, edge in enumerate(original_edgelist):
    adjacency_edge = adjacency_edgelist[i]
    if edge[0] != adjacency_edge[0]:
        print("Sorted Edge Mismatch at edge %s:" % i, edge, adjacency_edge)
        break


Sorted Edge Mismatch at edge 1: ('addison', 'karev', {}) ('adele', 'chief', {'weight': 1.0})

⚠️ Observation: Source and Target are ambiguously defined in undirected graphs

Exporting Graphs

We'll export the graph in GEXF (Graph Exchange XML Format).


In [32]:
nx.write_gexf(GA, '../data/ga_graph.gexf')